
feat(profiles): rollout data compression #92133


Merged
merged 1 commit into from
May 22, 2025

Conversation

john-z-yang
Member

No description provided.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label May 22, 2025
@john-z-yang john-z-yang marked this pull request as ready for review May 22, 2025 17:02
@john-z-yang john-z-yang requested a review from a team as a code owner May 22, 2025 17:02

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

⚠️ Parser warning

The parser emitted a warning. Please review your JUnit XML file:

Warning while parsing testcase attributes: Limit of string is 1000 chars, for name, we got 2083 at 1:156614 in /home/runner/work/sentry/sentry/.artifacts/pytest.junit.xml
Files with missing lines                              Patch %   Lines
src/sentry/profiles/consumers/process/factory.py      50.00%    3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #92133       +/-   ##
===========================================
+ Coverage   40.47%   87.86%   +47.39%     
===========================================
  Files       10145    10179       +34     
  Lines      582238   583484     +1246     
  Branches    22627    22627               
===========================================
+ Hits       235649   512690   +277041     
+ Misses     346137    70342   -275795     
  Partials      452      452               

@john-z-yang john-z-yang force-pushed the john/rollout-profile-compression branch from 4ee36cf to 89cef9a Compare May 22, 2025 17:33
@john-z-yang john-z-yang enabled auto-merge (squash) May 22, 2025 17:57
Member

@markstory markstory left a comment


Looks good to me 👍


- if random.random() < options.get("taskworker.try_compress.profile_metrics"):
+ if random.random() < options.get("taskworker.try_compress.profile_metrics.rollout"):
Member

No need to change this, but in the future you can use sentry.options.rollout.in_random_rollout() for these kinds of checks.
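For reference, a minimal self-contained sketch of what such a rollout helper does. The real helper lives in `sentry.options.rollout` and is backed by Sentry's options system; the `_options` dict and the hard-coded rate here are stand-ins for illustration only.

```python
import random

# Hypothetical stand-in for Sentry's options store; in the real code this
# would be options.get(...) reading from the configured options backend.
_options = {"taskworker.try_compress.profile_metrics.rollout": 0.25}

def in_random_rollout(option_name: str) -> bool:
    """Return True for roughly `rate` fraction of calls.

    Mirrors the `random.random() < options.get(option_name)` pattern used
    in this PR; the real sentry.options.rollout.in_random_rollout() wraps
    the same check behind a single named helper.
    """
    return random.random() < _options.get(option_name, 0.0)

# With a rate of 0.25, roughly a quarter of calls take the rollout path.
hits = sum(
    in_random_rollout("taskworker.try_compress.profile_metrics.rollout")
    for _ in range(10_000)
)
```

The advantage of the named helper is that every rollout check reads the same way and the "missing option means disabled" default lives in one place.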

Member Author

Ah that's good to know, thank you!

@john-z-yang john-z-yang merged commit 6dec121 into master May 22, 2025
59 checks passed
@john-z-yang john-z-yang deleted the john/rollout-profile-compression branch May 22, 2025 18:43
Comment on lines +32 to +34
b64encoded_compressed = b64encode(
zlib.compress(
message.payload.value,
Member

sorry for the drive-by review after the fact:

Is there a specific reason why you went with zlib over zstd, which is an overall better compression algorithm that should generally be preferred?
And is the base64 encoding a hard requirement because the tasks can’t handle bytes arguments?

It is just really weird that this ends up being base64(zlib(base64(msgpack))).
We could just make this zstd(msgpack), or base64(zstd(msgpack)) in case we really can’t have bytes.

By definition, base64 inflates the payload size by about 33%, and it's wasteful to pay that cost twice.
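The 33% figure follows from base64 mapping every 3 input bytes to 4 output characters; a quick self-contained check:

```python
import os
from base64 import b64encode

raw = os.urandom(3000)   # arbitrary binary payload, length divisible by 3
encoded = b64encode(raw)

# 3 bytes -> 4 output characters, so the encoded form is 4/3 the size
# (exactly, here, since no padding is needed).
ratio = len(encoded) / len(raw)
```

With a payload length that is not a multiple of 3, padding pushes the ratio slightly above 4/3.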

Member

Is there a specific reason why you went with zlib over zstd, which is an overall better compression algorithm that should generally be preferred?

zlib was in stdlib, and I didn't know we already had the necessary dependencies for zstandard.

It is just really weird that this ends up being base64(zlib(base64(msgpack))).

The current implementation doesn't have a double base64 encode. We base64-encode both the compressed and uncompressed forms so that we can measure the effect of compression, but the task payload itself is only base64-encoded once, after compression.
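A standalone sketch of that kind of before/after measurement, comparing the final encoded sizes with and without zlib. The payload here is an illustrative stand-in, not the PR's actual msgpack data, and the metric emission in the real consumer differs.

```python
import zlib
from base64 import b64encode

# Stand-in for a serialized profile chunk; real payloads come off Kafka
# as msgpack bytes. Repetitive content is typical of such payloads.
payload = b'{"profile_id": "abc", "samples": ' + b"[0, 1, 2, 3]" * 200 + b"}"

# Size of the task argument if sent uncompressed (still base64-encoded,
# since the task payload must be a str).
plain_size = len(b64encode(payload))

# Size after compressing first, then base64-encoding once.
compressed_size = len(b64encode(zlib.compress(payload)))

# Fraction of the encoded payload saved by compressing before encoding.
savings = 1 - compressed_size / plain_size
```

Encoding both variants lets you report the ratio without ever shipping the uncompressed form.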

And is the base64 encoding a hard requirement because the tasks can’t handle bytes arguments?

Yes, bytes aren't JSON encodable, so we needed a way to get a str.
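That constraint is easy to verify with the stdlib: `json.dumps` rejects bytes outright, which is why the compressed payload has to pass through base64 to become a str first.

```python
import json
import zlib
from base64 import b64decode, b64encode

compressed = zlib.compress(b"profile payload bytes")

# bytes are not JSON encodable: json.dumps raises TypeError.
try:
    json.dumps({"payload": compressed})
    raised = False
except TypeError:
    raised = True

# A base64 str, however, round-trips through JSON cleanly.
encoded = b64encode(compressed).decode("ascii")
roundtripped = json.loads(json.dumps({"payload": encoded}))["payload"]
restored = zlib.decompress(b64decode(roundtripped))
```

The cost is the ~33% base64 inflation discussed above, which a bytes-capable task transport would avoid.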

Member

Thanks for clarifying. I got confused by the double base64; re-reading this, it's clear there is no double-encoding going on :-D
